Multi-pass sentence-end detection of lecture speech
نویسندگان
چکیده
Making speech recognition output readable is an important task. The first step here is automatic sentence end detection (SED). We introduce novel F0 derivative-based features and sentence end distance features for SED that yield significant improvements in slot error rate (SER) in a multi-pass framework. Three different SED approaches are compared on a spoken lecture task: hidden event language models, boosting, and conditional random fields (CRFs). Experiments on reference transcripts show that CRF-based models give best results. Inclusion of pause duration features yields an improvement of 11.1% in SER. The addition of the F0-derivative features gives a further reduction of 3.0% absolute, and an additional 0.5% is gained by use of backward distance features. In the absence of audio, the use of backward features alone yields 2.2% absolute reduction in SER.
منابع مشابه
بررسی وضوح گفتار کودکان فلج مغزی اسپاستیک 8 تا 12 ساله
Background and purpose: Speech intelligibility refers to how speech is understandable by listeners. This study examined speech intelligibility in children (Persian native speakers) with spastic cerebral palsy aged 8-12 years old. Materials and methods: A cross-sectional study was performed in 31dysarthric students (….. boys and …..girls) in Tehran, 2014. A list of w...
متن کاملOnline speaker adaptation and tracking for real-time speech recognition
This paper describes a low-latency online speaker adaptation framework. The main objective is to apply fast speaker adaptation to a real-time (RT) large vocabulary continuous speech recognition (LVCSR) engine. In this framework, speaker adaptation is performed on speaker turns generated by online speaker change detection and speaker clustering. To maximize long-term system performance, the adap...
متن کاملA multi-pass error detection and correction framework for Mandarin LVCSR
We previously proposed a multi-pass framework for Large Vocabulary Continuous Speech Recognition (LVCSR). The objective of this framework is to apply sophisticated linguistic models for recognition, while maintaining a balance between complexity and efficiency. The framework is composed of three passes: initial recognition, error detection and error correction. This paper presents and evaluates...
متن کاملSpeech Summarization using Weighte
This paper proposes an integrated framework to summarize spontaneous speech into written-style compact sentences. Most current speech recognition systems attempt to transcribe whole spoken words correctly. However, recognition results of spontaneous speech are usually difficult to understand, even if the recognition is perfect, because spontaneous speech includes redundant information, and its ...
متن کاملOpen-Domain Name Error Detection using a Multi-Task RNN
Out-of-vocabulary name errors in speech recognition create significant problems for downstream language processing, but the fact that they are rare poses challenges for automatic detection, particularly in an open-domain scenario. To address this problem, a multi-task recurrent neural network language model for sentence-level name detection is proposed for use in combination with out-of-vocabul...
متن کامل